NII, Japan at the first THUMOS Workshop 2013

Authors

  • Sang Phan
  • Duy-Dinh Le
  • Shin’ichi Satoh
Abstract

We submit the results of our system to the Recognition Task of the THUMOS’13 Challenge. We apply a multi-modal approach to recognize actions in the UCF101 dataset, combining Dense Trajectories (motion features), SIFT (still-image features) and MFCC (audio features). We also use Fisher vector encoding, a state-of-the-art feature representation on popular image classification and action recognition datasets.

1. Classification Framework

The action classification framework is shown in Figure 1. It consists of three steps: feature extraction, feature encoding and action classification.

1.1. Feature Extraction

We use features from different modalities to model the actions in UCF101: motion features, still-image features and audio features. For motion, we use Dense Trajectories [7] with the Motion Boundary Histogram (MBH) descriptor to capture movement patterns in the video. This state-of-the-art motion feature is well suited to realistic videos because it can suppress camera motion. For still images, we use the standard SIFT [4] descriptor with the Hessian-Laplace detector. Finally, standard MFCC [3] features capture the audio: the audio segment length is set to 25 ms and the overlap parameter to 10 ms, and each segment is represented by 13-dimensional MFCCs together with their first and second derivatives (39 dimensions in total).

1.2. Fisher Vector Encoding

Fisher vector [5] is a recently developed feature encoding technique that also takes into account the mean and variance of the local descriptors belonging to each cluster. Fisher vectors therefore encode more information than bag-of-words encoding. Following the standard Fisher vector implementation, we use a codebook of 256 clusters generated with a Gaussian Mixture Model (GMM).
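To make the encoding step concrete, here is a minimal NumPy sketch of a Fisher vector for a diagonal-covariance GMM. It follows the common formulation that stacks the normalized gradients with respect to the GMM means and variances, giving 2KD dimensions for K clusters and D-dimensional descriptors. It assumes the GMM parameters have already been estimated (e.g. by EM on training descriptors). The signed-square-root and L2 normalization at the end are a widespread convention that the text above does not state, so treat them as an assumption. The function name and toy data are illustrative only.

```python
import numpy as np


def fisher_vector(x, weights, means, variances):
    """Encode local descriptors x (N x D) with a diagonal-covariance GMM.

    weights: (K,), means: (K, D), variances: (K, D).
    Returns a 2*K*D vector: mean gradients followed by variance gradients.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    n = x.shape[0]
    std = np.sqrt(variances)

    # Standardized differences (x_n - mu_k) / sigma_k, shape (N, K, D).
    diff = (x[:, None, :] - means[None, :, :]) / std[None, :, :]

    # Soft assignments gamma[n, k] = p(k | x_n), computed in log space for stability.
    log_gauss = -0.5 * np.sum(diff ** 2 + np.log(2.0 * np.pi * variances)[None, :, :], axis=2)
    log_post = np.log(weights)[None, :] + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)

    # First-order (mean) and second-order (variance) statistics per cluster.
    g_mu = np.einsum("nk,nkd->kd", gamma, diff) / (n * np.sqrt(weights)[:, None])
    g_var = np.einsum("nk,nkd->kd", gamma, diff ** 2 - 1.0) / (n * np.sqrt(2.0 * weights)[:, None])

    fv = np.concatenate([g_mu.ravel(), g_var.ravel()])
    # Assumed post-processing: signed square root, then L2 normalization.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv


rng = np.random.default_rng(0)
num_clusters, dim = 4, 3                      # toy sizes; the text uses 256 clusters
weights = np.full(num_clusters, 1.0 / num_clusters)
means = rng.normal(size=(num_clusters, dim))
variances = np.ones((num_clusters, dim))
descriptors = rng.normal(size=(50, dim))      # stand-in for local descriptors of one video
fv = fisher_vector(descriptors, weights, means, variances)
print(fv.shape)  # (24,)
```

With K = 256 and the PCA-reduced descriptor sizes reported for SIFT (80) and MBH (128), this formulation gives 2 × 256 × 80 = 40,960 and 2 × 256 × 128 = 65,536 dimensions per video, respectively.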
We further improve the expressiveness of the Fisher vector by applying PCA to reduce the dimensionality of the local descriptors before encoding: from 128 to 80 dimensions for SIFT and from 192 to 128 dimensions for the MBH descriptor.

1.3. Action Classification

We use the popular Support Vector Machine (SVM) for classification, trained one-vs-rest: for each action class, the videos of that class are the positive samples and all remaining videos are the negative samples. We use LibSVM [1], a standard SVM implementation, for training, with a linear kernel on the Fisher-vector-encoded features. To reduce training time, we also use a pre-computed kernel. Because the same kernel matrix can be reused by every per-class classifier, this technique is especially useful when the number of action classes is large.
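The saving from a pre-computed kernel comes from computing the linear Gram matrix of the Fisher vectors once and reusing it for every one-vs-rest problem, where only the labels change. The sketch below illustrates this with NumPy and writes rows in LibSVM's documented precomputed-kernel input format (`<label> 0:<serial id> 1:K(x_i,x_1) ... n:K(x_i,x_n)`, used with `svm-train -t 4`). The helper names and toy data are hypothetical, and the actual LibSVM training call is not shown.

```python
import numpy as np


def linear_gram(a, b):
    """Linear kernel K[i, j] = <a_i, b_j> between the rows of a and b."""
    return a @ b.T


def libsvm_precomputed_lines(gram, labels):
    """Format kernel rows in LibSVM's precomputed-kernel (-t 4) input format.

    Each line is "<label> 0:<1-based serial id> 1:K(x_i,x_1) ... n:K(x_i,x_n)".
    """
    lines = []
    for i, (row, y) in enumerate(zip(gram, labels), start=1):
        values = " ".join(f"{j}:{v:.6g}" for j, v in enumerate(row, start=1))
        lines.append(f"{y} 0:{i} {values}")
    return lines


def one_vs_rest_labels(class_ids, target):
    """+1 for videos of the target class, -1 for all remaining videos."""
    return [1 if c == target else -1 for c in class_ids]


rng = np.random.default_rng(0)
train_fv = rng.normal(size=(6, 5))          # toy stand-in for Fisher vectors
class_ids = [0, 0, 1, 1, 2, 2]
gram = linear_gram(train_fv, train_fv)      # computed once ...
per_class_lines = {}
for target in sorted(set(class_ids)):       # ... and reused for every class
    labels = one_vs_rest_labels(class_ids, target)
    per_class_lines[target] = libsvm_precomputed_lines(gram, labels)
print(per_class_lines[1][2].split()[:2])  # ['1', '0:3']
```

For test videos, the rows would instead hold kernel values against the training vectors, `linear_gram(test_fv, train_fv)`; LibSVM ignores the serial id on test rows.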

Similar sources

LEAR-INRIA submission for the THUMOS workshop

This notebook paper describes the submission of the LEAR team from INRIA to the THUMOS workshop in conjunction with ICCV 2013. Our system is based on the recent improvement of dense trajectory feature [14]. After extracting the local features, we apply Fisher vector to integrate them into a compact representation for each video. We also use spatio-temporal pyramids to embed structure informatio...

Message from the Organizing Chair

This volume is the formal record of more than a year of activities of the third NII Test Collection for Information Retrieval (NTCIR) Workshop. Based on two preceding successes of NTCIR-I in 1998-1999 and NTCIR-II in 2000, this third Workshop could have marked another milestone in the history of NII research activities. Since the foundation of the National Institute of Informatics (NII) in 2000...

The LEAR submission at Thumos 2014

We describe the submission of the INRIA LEAR team to the THUMOS workshop in conjunction with ECCV 2014. Our system is based on Fisher vector (FV) encoding of dense trajectory features (DTF), which we also used in our 2013 submission. This year’s submission additionally incorporates static-image features (SIFT, Color, and CNN) and audio features (ASR and MFCC) for the classification task. For the...

Preface from the Organizing Chair

On behalf of National Institute of Informatics, I would like to express our gratitude for the efforts and the collaboration of those who participated in the fourth NII Test Collection for Information Retrieval (NTCIR-4) Workshop. This volume is the formal record of research activities of NTCIR-4 meeting, 2-4 June, 2004, in which the final reports on the participants’ research activities since J...

MindLAB at the THUMOS Challenge

In this notebook paper we describe the MindLAB research group participation at the THUMOS challenge held as part of the ICCV 2013 conference. Two runs were submitted using different methods (SVM, OMF) with the features provided by the challenge organizers (DTF). The performance obtained shows an improvement over the baseline method.

Publication date: 2013